Hyperfeatures - Multilevel Local Coding for Visual Recognition

Authors

  • Ankur Agarwal
  • Bill Triggs
Abstract

Histograms of local appearance descriptors are a popular representation for visual recognition. They are highly discriminant and they have good resistance to local occlusions and to geometric and photometric variations, but they are not able to exploit spatial co-occurrence statistics of features at scales larger than their local input patches. We present a new multilevel visual representation, ‘hyperfeatures’, that is designed to remedy this. The basis of the work is the familiar notion that to detect object parts, in practice it often suffices to detect co-occurrences of more local object fragments – a process that can be formalized as comparison (vector quantization) of image patches against a codebook of known fragments, followed by local aggregation of the resulting codebook membership vectors to detect co-occurrences. This process converts collections of local image descriptor vectors into slightly less local histogram vectors – higher-level but spatially coarser descriptors. Our central observation is that it can therefore be iterated, and that doing so captures and codes ever larger assemblies of object parts and increasingly abstract or ‘semantic’ image properties. This repeated nonlinear ‘folding’ is essentially different from that of hierarchical models such as Convolutional Neural Networks and HMAX, being based on repeated comparison to local prototypes and accumulation of co-occurrence statistics rather than on repeated convolution and rectification. We formulate the hyperfeatures model and study its performance under several different image coding methods including clustering-based Vector Quantization, Gaussian Mixtures, and combinations of these with Latent Dirichlet Allocation. We find that the resulting high-level features provide improved performance in several object image and texture image classification tasks.
Key-words: Computer vision, Visual recognition, Image coding, Image classification

∗ GRAVIR and INRIA Rhône Alpes, Email: [email protected]
† GRAVIR and CNRS, Email: [email protected]

Hyperfeatures: a local hierarchical representation for visual recognition

Résumé (French abstract, translated): Characterizing image content in a robust and discriminant way remains a major challenge for visual recognition. One promising approach is to evaluate a set of invariant local visual descriptors over a collection of regions extracted from the image, and to characterize their statistics, and hence the image content, by a histogram of their quantized values. This ‘bag of descriptors’ method describes local appearance well and also resists occlusions and geometric and photometric deformations, but it has difficulty encoding the geometric structure of the scene. We present a new visual representation, ‘hyperfeatures’, that remedies this shortcoming. The basic idea is to code not only the marginal distribution of the appearance classes of the regions but also their local co-occurrences, and to do so repeatedly. Starting from the local descriptor vectors of each region, the method quantizes them and accumulates them over local super-regions to build a local appearance histogram for each super-region. Since each histogram is again a local descriptor vector, the process can be repeated several times, each pass producing, over a larger region, a higher-level feature that codes the co-occurrences of the ‘scene sub-parts’ from the level below. We hypothesize that the higher the level, the more ‘semantic’ these features become and the more they characterize the high-level content of the image. We describe the hyperfeatures model and study its performance under several descriptor coding methods: vector quantization, Gaussian mixtures, and Latent Dirichlet Allocation. Our experiments on image and texture classification and on object localization show that incorporating hyperfeatures improves the performance of the base method.

Key-words (translated): Computer vision, visual recognition, image characterization and classification
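The iteration described in the abstract (quantize local descriptors against a codebook, pool the codes into local histograms, then treat those histograms as the next level's descriptors) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes hard nearest-centre vector quantization, hand-picked codebooks rather than learned ones, a single scale, and square pooling windows with stride 1. The function names `nearest_code`, `code_and_pool` and `hyperfeatures`, and the toy data, are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]
Grid = List[List[Vector]]  # grid[y][x] is one local descriptor vector


def nearest_code(v: Vector, codebook: Sequence[Vector]) -> int:
    """Hard vector quantization: index of the closest codebook centre."""
    def sq_dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(v, c))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def code_and_pool(grid: Grid, codebook: Sequence[Vector], size: int = 2) -> Grid:
    """Quantize every descriptor, then histogram the codes over each
    size x size neighbourhood. The result is a coarser grid of normalized
    histograms, usable as descriptors for the next level."""
    codes = [[nearest_code(v, codebook) for v in row] for row in grid]
    height, width = len(codes), len(codes[0])
    weight = 1.0 / (size * size)
    pooled: Grid = []
    for y in range(height - size + 1):
        row = []
        for x in range(width - size + 1):
            hist = [0.0] * len(codebook)
            for dy in range(size):
                for dx in range(size):
                    hist[codes[y + dy][x + dx]] += weight
            row.append(hist)
        pooled.append(row)
    return pooled


def hyperfeatures(grid: Grid, codebooks: Sequence[Sequence[Vector]],
                  size: int = 2) -> List[Grid]:
    """Apply coding and pooling once per codebook; keep every level's output."""
    levels = []
    for codebook in codebooks:
        grid = code_and_pool(grid, codebook, size)
        levels.append(grid)
    return levels


# Toy input (assumed for illustration): a 3 x 3 grid of 1-D descriptors.
base = [
    [[0.0], [0.0], [1.0]],
    [[0.0], [0.0], [1.0]],
    [[1.0], [1.0], [1.0]],
]
codebook_0 = [[0.0], [1.0]]            # codes raw descriptors
codebook_1 = [[1.0, 0.0], [0.4, 0.6]]  # codes first-level histograms
levels = hyperfeatures(base, [codebook_0, codebook_1])
print(levels[0])  # 2 x 2 grid of 2-bin histograms
print(levels[1])  # 1 x 1 grid holding one 2-bin histogram
```

In this toy run the first level yields four local histograms, and the second level summarizes how those histograms co-occur within its window, which is the sense in which each level covers a larger image region than the one below it.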

Similar resources

Multilevel Local Coding for Visual Recognition

Histograms of local appearance descriptors are a popular representation for visual recognition. They are highly discriminant and they have good resistance to local occlusions and to geometric and photometric variations, but they are not able to exploit spatial co-occurrence statistics of features at scales larger than their local input patches. We present a new multilevel visual representation,...

Multilevel Input Ring-TCM Coding Scheme: A Method for Generating High-Rate Codes

The capability of multilevel input ring-TCM coding scheme for generating high-rate codes with improved symbol Hamming and squared Euclidean distances is demonstrated. The existence of uniform codes and the decoder complexity are also considered.

Recognition of Visual Events using Spatio-Temporal Information of the Video Signal

Recognition of visual events as a video analysis task has become popular in the machine learning community. While traditional approaches to video event detection have been used for a long time, recently developed deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...

Image Compression Using Adaptive Multilevel Block Truncation Coding

The block truncation coding (BTC) algorithm for image compression has the advantages of a low computational load and a low memory requirement. In this paper, an adaptive image compression algorithm using multilevel BTC is proposed. An input image is partitioned into blocks of variable size, and the gray values of each block are adaptively quantized to one, two, or four levels according to loca...

Spatially Local Coding for Object Recognition

The spatial pyramid and its variants have been among the most popular and successful models for object recognition. In these models, local visual features are coded across elements of a visual vocabulary, and then these codes are pooled into histograms at several spatial granularities. We introduce spatially local coding, an alternative way to include spatial information in the image model. Ins...

Publication date: 2006